This article discusses the limitations of Large Language Models (LLMs) in classification tasks, focusing on their inability to express uncertainty and the need for more accurate performance metrics. New benchmarks and a metric named OMNIACCURACY have been introduced to assess LLMs' capabilities in scenarios both with and without correct labels.
Classification with an LLM typically relies on next-token prediction: the model predicts the next token, which is then converted into a classification label.
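A rough sketch of that pattern, assuming the Hugging Face transformers library; the model name, prompt, and label set below are placeholders, not taken from the article:

```python
# Sketch: classification via next-token prediction (model and labels are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

labels = ["positive", "negative"]  # candidate class names, assumed for the example
prompt = "Review: 'Great battery life, terrible screen.'\nSentiment:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # distribution over the next token

# Score each label by the logit of its first token and pick the most likely one.
label_ids = [tokenizer(" " + l, add_special_tokens=False).input_ids[0] for l in labels]
prediction = labels[int(torch.argmax(logits[label_ids]))]
print(prediction)
```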
A case study on measuring context relevance in retrieval-augmented generation systems using Ragas, TruLens, and DeepEval, with practical strategies for evaluating the accuracy and relevance of the retrieved context.
An AI agent that helps write and fix code, running tests and iterating until the code passes the tests or matches the designs. It uses the OpenAI API and aims to make coding easier.
Micro Agent is an AI agent that assists with coding, helping with code generation and iteration. It is a focused agent that aims to write code based on provided test cases or design screenshots, and it can work in tandem with the OpenAI and Anthropic APIs for better visual matching. The design centers on a narrow loop: create a clear test case, then feed back results on the generated code so that each iteration improves it. Installation requires Node.js v14 or later, and the agent can be installed globally using npm; to get started, running it in interactive mode is recommended. Micro Agent supports both a unit-test matching mode and a visual matching mode, uses a multi-agent approach, and connects with Figma for high-fidelity design-to-code conversions. Configuration options are available via CLI or UI.
This is a hands-on guide with Python example code that walks through the deployment of an ML-based search API using a simple 3-step approach. The article provides a deployment strategy applicable to most machine learning solutions, and the example code is available on GitHub.
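The article's three steps aren't reproduced here, but as a loose sketch of the end result, assuming FastAPI and sentence-transformers as stand-ins rather than the article's actual stack, a minimal search endpoint might look like this:

```python
# Minimal ML-backed search API sketch (framework and model are assumptions, not the article's code).
import numpy as np
from fastapi import FastAPI
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

documents = ["how to deploy a model", "tuning hyperparameters", "monitoring drift"]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

@app.get("/search")
def search(query: str, k: int = 3):
    query_embedding = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_embeddings @ query_embedding  # cosine similarity (embeddings are normalized)
    top = np.argsort(-scores)[:k]
    return [{"document": documents[i], "score": float(scores[i])} for i in top]

# Run locally with: uvicorn app:app --reload  (assuming this file is app.py)
```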
The highlighted articles cover a variety of topics, including algorithmic thinking for data scientists, outlier detection in time-series data, route optimization for visiting NFL teams, a solution to the minimum vertex coloring problem, high-cardinality features, multilingual RAG (retrieval-augmented generation) system development, fine-tuning smaller transformer models, long-form visual understanding, multimodal image-text models, the theoretical underpinnings of learning, data science stress management, and reinforcement learning.
A lightweight codebase that enables memory-efficient and performant fine-tuning of Mistral's models. It is based on LoRA, a training paradigm in which most weights are frozen and only 1-2% of additional weights, in the form of low-rank matrix perturbations, are trained.
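The repository's implementation isn't reproduced here, but the core LoRA idea, a frozen base weight plus a trainable low-rank update, can be sketched in a few lines of PyTorch:

```python
# Sketch of a LoRA-style linear layer: W is frozen, only the low-rank A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False          # freeze the pretrained weight
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # Output = frozen W x + scaled low-rank perturbation (B A) x
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(4096, 4096, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.2%}")  # roughly 0.4% for this single layer at rank 8
```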
Discusses trends in Large Language Model (LLM) architecture, including the push toward more GPUs, more weights, and more tokens; energy-efficient implementations; the role of LLM routers; and the need for better evaluation metrics, faster fine-tuning, and self-tuning.
This article discusses a method for automatically curating high-quality datasets for self-supervised pre-training of machine learning systems. The method involves successive and hierarchical applications of k-means on a large and diverse data repository to obtain clusters that distribute uniformly among data concepts, followed by a hierarchical, balanced sampling step from these clusters. Experiments on three different data domains show that features trained on the automatically curated datasets outperform those trained on uncurated data, while being on par with or better than features trained on manually curated data.
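A rough sketch of the idea, assuming scikit-learn and only a two-level hierarchy (the paper's actual pipeline is more involved):

```python
# Sketch: two-level k-means followed by balanced sampling (illustrative, not the paper's code).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(10_000, 64))  # stand-in for embeddings of an uncurated data pool

# Level 1: fine-grained clusters over the whole pool.
fine = KMeans(n_clusters=200, n_init=10, random_state=0).fit(features)

# Level 2: cluster the fine centroids into coarse "concepts".
coarse = KMeans(n_clusters=20, n_init=10, random_state=0).fit(fine.cluster_centers_)
concept_of_point = coarse.labels_[fine.labels_]  # map each point to a coarse concept

# Balanced sampling: draw the same number of points from every concept.
per_concept = 100
curated_indices = np.concatenate([
    rng.choice(np.where(concept_of_point == c)[0], size=per_concept, replace=True)
    for c in range(20)
])
print(curated_indices.shape)  # (2000,), roughly uniform across concepts
```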
This article is part of a series titled ‘LLMs from Scratch’, a complete guide to understanding and building Large Language Models (LLMs). In this article, we discuss the self-attention mechanism and how transformers use it to create rich, context-aware embeddings.
The Self-Attention mechanism is used to add context to learned embeddings, which are vectors representing each word in the input sequence. The process involves the following steps:
1. Learned Embeddings: These are the initial vector representations of words, learned during the training phase. The weight matrix storing the learned embeddings sits in the first linear layer of the Transformer architecture.
2. Positional Encoding: This step adds positional information to the learned embeddings. Positional information helps the model understand the order of the words in the input sequence; since transformers process all words in parallel, without it the word order would be lost.
3. Self-Attention: The core of the Self-Attention mechanism is to update the learned embeddings with context from the surrounding words in the input sequence. It determines which words provide context to other words and uses that information to produce the final contextualized embeddings (a minimal sketch follows this list).
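To make these steps concrete, here is a minimal NumPy sketch of a single self-attention head over toy embeddings; the dimensions and weights are arbitrary stand-ins, and real transformers use learned projection matrices and multiple heads:

```python
# Toy single-head self-attention over a 4-word sequence (illustrative dimensions, random weights).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8

# 1. Learned embeddings: one vector per word (random stand-ins for trained values).
embeddings = rng.normal(size=(seq_len, d_model))

# 2. Positional encoding: sinusoidal signal added so word order is not lost.
pos = np.arange(seq_len)[:, None]
dim = np.arange(d_model)[None, :]
angle = pos / np.power(10_000, (2 * (dim // 2)) / d_model)
pos_enc = np.where(dim % 2 == 0, np.sin(angle), np.cos(angle))
x = embeddings + pos_enc

# 3. Self-attention: project to queries, keys, values, then mix values by attention weights.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_model)                               # how much each word attends to each other word
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # softmax over each row
contextual_embeddings = weights @ V                               # context-aware embedding for every word
print(contextual_embeddings.shape)  # (4, 8)
```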